An ATM router design is not something to take lightly under the best of circumstances. When the router must work intimately with RF links to its clients, provide service at up to 100 Mbits/s to each client and achieve an aggregate bandwidth of 6 Gbits/s, the challenges only get more formidable. Clearly this will be a design with a large number of sophisticated ASICs.
But add in one more detail: The whole assembly -- RF, modems and router -- will be living in a geosynchronous orbit, exposed to continuous radiation and beyond the reach of any conceivable repair or field upgrade. Now you have an interesting chip design.
This is the design that TRW's Astrolink payload team undertook. Altogether, the satellite payload required 10 different ASIC designs, four of which have more than 1 million gates and one of which -- the channelizer/demodulator chip that turns a digitized bit stream from the antenna amplifiers into a stream of ATM packets -- is a 5.5 million-gate monster.
All 10 chips were completed in the 19-month project window, and all the chips worked on the first pass -- a tribute to a methodology tuned to the most unforgiving of environments. The chips were developed by TRW in conjunction with ASIC vendor Fujitsu Microelectronics in that company's 0.25-micron, four-metal CMOS CE-71 and CS-71 processes.
To set the scale for the individual chip designs, consider the largest of the devices, the demodulator. The 5.5 million-gate chip takes in a bit stream that has been downconverted to 125 MHz and digitized by an external A/D converter. This stream must be demodulated to extract data, and then further processed to separate out the individual data channels from a TDMA signal. From these signals, control circuitry must inspect the data for "entry probes," by which ground terminals request services. In addition, the control circuitry must create reports for the higher-level operation of the router. Finally, the remaining data must be assembled into a single stream of ATM cells and sent on to the switch fabric in other chips. Each RF channel requires one of these devices, and each chip has the equivalent of 34 gigaoperations per second of processing power.
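To make that data path concrete, the following Python sketch walks through the same stage sequence in software: samples in, demodulated bits, per-channel TDMA separation, entry-probe screening and ATM cell assembly. It is purely illustrative; the function names, the round-robin channel split and the hard-decision demodulator are assumptions made for the example, not details of TRW's design.

# Illustrative software model of the demodulator data path, not TRW's design:
# digitized samples -> demodulated bits -> per-channel TDMA separation ->
# entry-probe screening -> assembly of payload bits into fixed-size ATM cells.

from typing import Iterable, List

ATM_CELL_BYTES = 53   # standard ATM cell size: 5-byte header + 48-byte payload

def demodulate(samples: Iterable[float]) -> List[int]:
    """Toy hard-decision demodulator: one bit per sample, sign decides the bit."""
    return [1 if s > 0.0 else 0 for s in samples]

def split_tdma(bits: List[int], n_channels: int) -> List[List[int]]:
    """Deinterleave a bit stream into per-channel slots (simple round-robin)."""
    return [bits[ch::n_channels] for ch in range(n_channels)]

def has_entry_probe(channel_bits: List[int], probe: List[int]) -> bool:
    """Flag a channel whose slot begins with the 'entry probe' request pattern."""
    return channel_bits[:len(probe)] == probe

def assemble_cells(channel_bits: List[int]) -> List[bytes]:
    """Pack a channel's bits into 53-byte ATM cells, zero-padding the tail."""
    nbits = ATM_CELL_BYTES * 8
    cells = []
    for i in range(0, len(channel_bits), nbits):
        chunk = channel_bits[i:i + nbits]
        chunk = chunk + [0] * (nbits - len(chunk))
        cells.append(bytes(int("".join(map(str, chunk[j:j + 8])), 2)
                           for j in range(0, nbits, 8)))
    return cells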
The challenges of space
On top of what was already a sophisticated design undertaking, the fact that the ASICs will be operating in high earth orbit imposed another whole set of issues. Probably the problem that comes to mind first is radiation. In a geosynchronous orbit, the Astrolink satellite is outside most of the radiation shielding provided by the earth. The designers had to expect that radiation would cause logic states to change spontaneously within the digital circuitry -- so-called single-event upsets (SEUs).
The traditional approach to dealing with this problem in the days of big NASA budgets was radiation-hardened circuit design. You simply made the transistors and flip-flops so robust that a single alpha particle couldn't do that much damage. But in the age of commercial off-the-shelf processes and commercial-size budgets, using a rad-hard or S-class process was not a serious option. The design team had to find other ways to tolerate SEUs.
This load fell not on the library designers at Fujitsu, but on the RTL design team. Massive redundancy, in which whole sections of logic would be duplicated and would vote on the correct result, was theoretically possible, but would have exceeded the density range of the quarter-micron process. So instead, the designers used selective redundancy at the circuit level, combined with RTL structures that were inherently resistant to upset. This meant that the design was using standard libraries, but that additional complexity was present at the RT and gate levels.
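The article does not spell out which upset-resistant structures were used, but the classic circuit-level form of selective redundancy is a triplicated register whose output is a 2-of-3 majority vote, so that a single flipped bit in any one copy is masked. The Python sketch below illustrates only that voting idea; the class and names are invented for the example.

# Illustrative majority-vote redundancy (classic triple modular redundancy).
# Three copies of the same state are kept; a bitwise 2-of-3 vote masks a
# single-event upset that flips bits in any one copy. Names are illustrative.

def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each output bit follows at least two of the copies."""
    return (a & b) | (b & c) | (a & c)

class TripleRedundantRegister:
    def __init__(self, width: int = 32):
        self.width = width
        self.copies = [0, 0, 0]          # three independent flip-flop banks

    def write(self, value: int) -> None:
        self.copies = [value, value, value]

    def upset(self, copy_index: int, bit: int) -> None:
        """Model an SEU: flip one bit in one of the three copies."""
        self.copies[copy_index] ^= (1 << bit)

    def read(self) -> int:
        # The voter output is also fed back to scrub the corrupted copy.
        voted = majority_vote(*self.copies)
        self.copies = [voted, voted, voted]
        return voted

reg = TripleRedundantRegister()
reg.write(0b1010)
reg.upset(copy_index=1, bit=3)           # radiation flips a bit in one copy
assert reg.read() == 0b1010              # the vote masks the upset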
Verification becomes an issue
Orbital operation had another implication, however, that would prove even more pervasive than the radiation problem. That was the simple fact that once the payload was launched, there would be no way to correct design errors. The chips had to be correct -- not just good
enough to ship the first prototype run, but correct enough to last for the entire product life.
This put an enormous burden on the verification team, which already faced the most complex design ever placed into a payload. Not only would verification have to prove that the design had been implemented correctly, but it would have to confirm operational specifications such as bit error rate and cell loss rate from expected RF waveforms before the chips could tape out.
Faced with this challenge, the design team selected a four-pronged approach. TRW used conventional simulation at all levels to examine the detailed behavior of the design. The team developed simulation models of the data-conversion process, so simulation could be stimulated by waveform data rather than simply by bit streams. In addition, they employed Verplex formal verification tools at each step in the development -- RTL to gate level, after BIST insertion and so on -- to screen for the introduction of errors.
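As an illustration of driving simulation with waveform data rather than ideal bit streams, the sketch below models a crude data-conversion front end: it modulates a test bit pattern onto a carrier, adds Gaussian noise and quantizes the result the way an external A/D converter would. The sample rate, resolution, modulation and noise level are assumptions for the example, not Astrolink parameters.

# Illustrative data-conversion model: generate a noisy modulated carrier and
# quantize it as an external A/D converter would, so a simulation can be
# driven with waveform samples instead of ideal bit streams.
# All parameters below are assumed for illustration only.

import math
import random

def adc_samples(bits, fs=500e6, f_carrier=125e6, adc_bits=8, snr_db=20.0):
    """BPSK-modulate 'bits' onto a carrier, add Gaussian noise, and quantize."""
    samples_per_bit = 32                       # assumed symbol length in samples
    noise_sigma = 10 ** (-snr_db / 20.0)
    full_scale = 2 ** (adc_bits - 1) - 1
    out = []
    for i, bit in enumerate(bits):
        phase = 0.0 if bit else math.pi        # BPSK: bit selects carrier phase
        for n in range(samples_per_bit):
            t = (i * samples_per_bit + n) / fs
            v = math.cos(2 * math.pi * f_carrier * t + phase)
            v += random.gauss(0.0, noise_sigma)
            code = max(-full_scale, min(full_scale, round(v * full_scale)))
            out.append(code)                   # signed ADC code for the testbench
    return out

stimulus = adc_samples([1, 0, 1, 1, 0], snr_db=15.0)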
But the most innovative use of verification came from two additional tools: Ikos acceleration and Aptix emulation. The team used the Ikos system for postlayout simulation of the chips, primarily to confirm equivalence of the laid-out circuitry to the functional requirements. Given the enormous amount of data required to define a transaction through the router, the speed provided by the Ikos system was essential. About 10 days of simulation run-time was reduced to a few hours of Ikos time.
Yet even that was insufficient acceleration for the huge data sets needed to evaluate the system's response to second-order effects. This required a model from RF in to RF out of two demodulation channels and the entire router. For that, TRW created what is probably the world's largest assembly of Aptix emulation boards to create a real-time, full-path emulation of the payload.
The emulation required a total of 16 Aptix boards, lashed together through proprietary interconnect circuitry designed by the TRW team. Since the team needed to evaluate the design with actual RF going in and coming out, there was no practical way around having the emulation run in real time. Anything less would have required the waveform data to be digitized and buffered, and then the buffer sizes would have limited the length of test cases, introducing the possibility that some serious fault would go untested.
As it turned out, Aptix's use of state-of-market commercial FPGAs was a key factor in achieving real-time operation. The FPGAs -- initially Xilinx 4020 devices, gradually replaced by Virtex devices as the project progressed -- could run individual blocks of the design in real time. But the links between the boards were limited to about 30 MHz, even with TRW's additional circuitry. So the design had to be partitioned so that each module could fit into a single Aptix board and require less than 30 MHz of bandwidth to other modules.
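That constraint reduces to a simple feasibility check on any candidate partition: sum the signal traffic crossing each board boundary and compare it with the roughly 30-MHz link limit. The sketch below shows such a check in generic form; the module names, board assignments and toggle rates are invented for the example, not taken from the Astrolink design.

# Illustrative check of an emulation partition: every signal that connects
# modules on different boards consumes inter-board bandwidth, which must stay
# under the (approximate) 30-MHz link limit cited above.
# The modules, assignments and rates below are invented for illustration.

INTER_BOARD_LIMIT_MHZ = 30.0

# (source module, destination module, signal rate in MHz)
signals = [
    ("demod_a", "router_core", 12.0),
    ("demod_b", "router_core", 12.0),
    ("router_core", "cell_assembler", 20.0),
]

board_of = {
    "demod_a": 0,
    "demod_b": 1,
    "router_core": 2,
    "cell_assembler": 2,
}

def partition_is_feasible(signals, board_of, limit=INTER_BOARD_LIMIT_MHZ):
    """Sum traffic per board pair and flag any link above the limit."""
    traffic = {}
    for src, dst, rate in signals:
        if board_of[src] != board_of[dst]:
            key = tuple(sorted((board_of[src], board_of[dst])))
            traffic[key] = traffic.get(key, 0.0) + rate
    return all(rate <= limit for rate in traffic.values()), traffic

ok, link_traffic = partition_is_feasible(signals, board_of)
print(ok, link_traffic)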
The Aptix system was designed so that it could be fed either synthetic bit streams or live
RF. This gave the team not only the ability to test large waveforms without having to
store them somewhere, but also the ability to correlate behavior from the two different sources. It also made it possible to inject RF waveforms with expected noise patterns and measure bit error rate and cell loss rate directly at the demodulator emulation. From this data, the
team could not only verify correct behavior of the logic, but could demonstrate that the total system was meeting its performance goals prior to launch.
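A rough picture of that measurement loop: push a known bit pattern through a noisy channel model, demodulate, count errored bits, and then estimate cell loss from the bit error rate. The channel model, noise level and independence assumption below are illustrative only, not the noise profile or coding used on Astrolink.

# Illustrative bit-error-rate measurement: send known bits through an additive
# Gaussian noise channel, make hard decisions, and compare with the reference.
# Noise level and the cell-loss approximation are assumptions for illustration.

import random

def measure_ber(n_bits=100_000, snr_db=8.0, seed=1):
    random.seed(seed)
    sigma = 10 ** (-snr_db / 20.0)
    errors = 0
    for _ in range(n_bits):
        bit = random.getrandbits(1)
        tx = 1.0 if bit else -1.0             # antipodal (BPSK-like) symbol
        rx = tx + random.gauss(0.0, sigma)    # channel noise
        if (rx > 0.0) != bool(bit):           # hard-decision demodulation
            errors += 1
    return errors / n_bits

ber = measure_ber()
# In this toy model a cell is lost if any of its 53 * 8 bits is errored and
# uncorrected; with independent bit errors the cell loss ratio is roughly:
cell_loss = 1 - (1 - ber) ** (53 * 8)
print(f"BER ~ {ber:.2e}, cell loss ~ {cell_loss:.2e}")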
In addition, the Aptix model's ability to run in real time proved vital to the software development efforts, allowing the software developers to stay on schedule and keep current with the hardware design.
Partitioning as magic
The topology of the Aptix systems set an upper bound of sorts on the system partitioning. Yet the team was determined not to let the tools dictate the chip partitioning. Another factor, however, did prove dictatorial.
Generally, the proportion of memory in a system-level chip increases with the density. Chips in the 5 million-gate range would normally be mostly memory, with perhaps 30 or 40 percent logic. This bails out the design tools, which otherwise might not be able to cope with
enormous blocks of logic.
Unfortunately, TRW's demodulator chip didn't follow the pattern -- it was only about 40 percent RAM. That meant about 125,000 flip-flop instances, making the design far too large for the place and route tools. The TRW designers found that they would have to do a hierarchical place and route of individual blocks, and then wire the blocks together in a separate step. Each of the blocks would be a little ASIC in its own right -- synthesized, clock tree and BIST inserted, placed and routed in isolation.
This approach caused one of the few surprises in the design flow. While the place and route tools were fine with a hierarchical design, the BIST insertion tool worked on a flattened netlist. This caused severe routing problems after the BIST circuitry had been inserted. The team eventually acquired a BIST insertion tool that could handle the design in a hierarchical manner.
The approach also put some tight constraints on the layout of the individual blocks. They had to be small enough to make timing closure practical, yet large enough so that each block could be a relatively self-contained function, minimizing bandwidth to other blocks. This helped both with the Aptix emulation and with the process of wiring the blocks together in the global route.
To make sure that the global routing didn't mess up the block-level timing closure too much, each block's inputs and outputs were registered, making its timing relatively independent of the rest of the chip.
Finally, curiously enough, each block had to be kept nearly square. In the four-metal process, the router, as you would expect, used two layers for X routes and two layers for Y routes. If the block's aspect ratio diverged too far from square, two of the routing layers would be underutilized and two would become congested.
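A back-of-the-envelope model shows why. Assume routing demand is roughly equal in X and Y, while the supply of horizontal tracks scales with block height and the supply of vertical tracks with block width (two metal layers for each direction). Holding area and demand fixed and stretching the aspect ratio then loads one pair of layers and starves the other. The pitch, area and demand figures in the sketch below are invented for illustration.

# Rough illustration of why a non-square block congests two routing layers:
# with two metal layers per direction, horizontal track count scales with the
# block height and vertical track count with its width.
# Pitch, area and demand figures below are invented for illustration.

import math

def track_utilization(area_um2, aspect_ratio, pitch_um=1.0,
                      demand_tracks_x=800, demand_tracks_y=800):
    """Compare routing demand with per-direction track supply for a block."""
    width = math.sqrt(area_um2 * aspect_ratio)
    height = math.sqrt(area_um2 / aspect_ratio)
    x_tracks = 2 * height / pitch_um      # two layers of horizontal tracks
    y_tracks = 2 * width / pitch_um       # two layers of vertical tracks
    return demand_tracks_x / x_tracks, demand_tracks_y / y_tracks

for ratio in (1.0, 4.0):
    ux, uy = track_utilization(area_um2=1_000_000, aspect_ratio=ratio)
    print(f"aspect {ratio:.0f}:1 -> X-layer load {ux:.0%}, Y-layer load {uy:.0%}")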
A moral at the end
When the dust cleared, TRW had completed the 10 ASICs in the payload in 19 months, with first-time success on each chip. A good deal of the credit can be given to the close working relationship between the design team and the ASIC vendor, and to the highly rigorous, system-down verification methodology that the team employed from the beginning. But another point is worth making.
Given the complexity of the total payload design and the size of the multicompany, multidisciplinary design team, the effort was highly structured from day one. The design started with formal requirements that flowed down the process as the system was partitioned, implemented and verified. To some designers, this is a characteristically "military" kind of design culture. But the experience at TRW underlines that such formal structure is not particular to any industry or community: It is simply necessary for first-time success on designs of such complexity.